{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Building RAG (LangChain) Using Yi's API\n",
    "\n",
    "In this tutorial, we will learn how to build a Retrieval-Augmented Generation (RAG) system using LangChain and Yi's API."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installing Necessary Libraries\n",
    "\n",
    "First, we need to install the LangChain library."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "!pip install langchain"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setting Up Environment Variables\n",
    "\n",
    "Next, we need to configure the LangSmith and Yi API keys. Please ensure you have registered and obtained API keys from [LangSmith](https://smith.langchain.com/) and [Yi Open Platform](https://platform.01.ai/apikeys)."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
    "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
    "os.environ[\"LANGCHAIN_API_KEY\"] = \"your_langsmith_api_key\"\n",
    "os.environ[\"YI_API_KEY\"] = \"your_yi_api_key\""
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installing LangChain-OpenAI Library\n",
    "\n",
    "We also need to install the `langchain-openai` library."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "!pip install -qU langchain-openai"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configuring the LLM\n",
    "\n",
    "Now let's configure the LLM using Yi's large language model."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "llm = ChatOpenAI(\n",
    "    base_url=\"https://api.01.ai/v1\",\n",
    "    api_key=os.environ[\"YI_API_KEY\"],\n",
    "    model=\"yi-large\",\n",
    ")"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading Data\n",
    "\n",
    "Next, we will load some sample data. Here, we use LangChain's WebBaseLoader to load web page data."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import bs4\n",
    "from langchain_community.document_loaders import WebBaseLoader\n",
    "\n",
    "loader = WebBaseLoader(\n",
    "    web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n",
    "    bs_kwargs=dict(\n",
    "        parse_only=bs4.SoupStrainer(\n",
    "            class_=(\"post-content\", \"post-title\", \"post-header\")\n",
    "        )\n",
    "    ),\n",
    ")\n",
    "docs = loader.load()"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building a Vector Database\n",
    "\n",
    "We will use HuggingFace's embedding model to build a vector database and use Chroma to store the vectors."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from langchain.embeddings import HuggingFaceEmbeddings\n",
    "from langchain_chroma import Chroma\n",
    "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
    "\n",
    "# Load the embedding model\n",
    "embedding = HuggingFaceEmbeddings(model_name=\"BAAI/bge-base-en-v1.5\")\n",
    "\n",
    "# Split the documents\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n",
    "splits = text_splitter.split_documents(docs)\n",
    "\n",
    "# Build the vector database\n",
    "vectorstore = Chroma.from_documents(documents=splits, embedding=embedding)\n",
    "retriever = vectorstore.as_retriever()"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the RAG Chain\n",
    "\n",
    "Finally, we will build the RAG chain and use LangChain's hub to get the prompt."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from langchain import hub\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.runnables import RunnablePassthrough\n",
    "\n",
    "prompt = hub.pull(\"rlm/rag-prompt\")\n",
    "\n",
    "def format_docs(docs):\n",
    "    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
    "\n",
    "rag_chain = (\n",
    "    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")\n",
    "\n",
    "response = rag_chain.invoke(\"What is Task Decomposition?\")\n",
    "print(response)"
   ],
   "execution_count": null,
   "outputs": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
